CHAPTER I


INTRODUCTION 


Classification  of  an  unknown  target  from  radar  return  signals  by  means  of 
sequential  hypothesis  testing  techniques  is  the  subject  of  this  study.  A  radar 
target  identification  (RTI)  system  is  to  be  designed  to  distinguish  measurements 
of  the  radar  backscatter  from  an  unknown  object  as  belonging  to  one  of  a  set  of 
M  classes,  with  each  class  corresponding  to  a  particular  airborne  radar  target. 

The  classification  of  targets  observed  by  a  radar  or  other  sensor  is  of  prime 
importance  in  ballistic  missile  defense  and  other  similar  problems.  Targets  are 
(at  least  conceptually)  examined  one  at  a  time  and  classified  based  on  the  time 
ordered  returns  from  a  set  of  consecutively  transmitted  pulses.  In  this  report  an 
observation  is  considered  to  be  a  sample  waveform  from  a  random  process,  and 
the  classification  algorithm  or  classifier  is  based  on  sequential  statistical  hypothesis 
testing. 

The radar system to be used is a general purpose, multifrequency,
multipolarization system. It operates in the range from 8 to 58 MHz in horizontal
receive mode (see [1]). For the targets of interest, these frequencies represent the
resonant region of a catalogue of radar targets which are used for the experimental
phases of the study [2]. The resonant region corresponds to the band of frequencies
with wavelengths which are approximately equal to the dimension of the target.

A reasonable design goal for a target identification system is to realize an
algorithm that is capable of producing a reliable decision with as few measurements
as possible. It has long been recognized [3,4] that sequential hypothesis testing
techniques provide a reasonable compromise in the tradeoff between the
classification error rate and the average number of measurements, $E\{n\}$, required
to reach a decision.

The  theory  of  sequential  hypothesis  testing  was  developed  for  the  binary  (two 
hypotheses)  case  by  Wald  [3].  Since  that  time,  this  theory  has  had  a  wide  variety 
of  applications.  For  the  application  to  target  identification,  the  target  returns  are 
observed  in  stages.  At  each  stage  a  decision  is  made  either  to  classify  the  target 
(as  a  particular  object  type)  or  to  make  another  observation. 

For the binary case, the sequential classification procedure is optimal in the
sense that a classification is made and the decision sequence ends with the
minimum average number of returns necessary to achieve a prescribed probability
of error [5]. Sequential classifiers provide important advantages over those
classifiers that employ a predetermined, fixed number of return measurements.
Targets that are easy to identify are classified quickly, while targets that are more
difficult to identify can be observed for a longer period of time before reaching a
decision. This results in a more efficient use of the sensor and of computational
resources and an overall improvement in the classification performance. Although
most of the important results from the theory of sequential testing do not require
statistical independence between successive observations, independence of
observations greatly simplifies the design and analysis of the performance of the
sequential test.

Since the original work of Wald [3], a number of techniques have been proposed
that extend binary hypothesis testing methods to the case of $M \geq 3$ alternatives
[6,7,4,8]. Each of these techniques realizes some performance characteristic that
may be desirable for certain applications. In addition to the classification error
rate and the average number of measurements, these characteristics may include
the maximum number of measurements allowed, the complexity of implementation,
and the sensitivity to noise power levels. The relative importance of these
performance parameters depends on the application.

The present consideration of $M$-ary sequential techniques is primarily motivated
by interest in the reliable identification of aircraft. For this particular
application, each measurement $X$ is a vector whose components $x_i$, $i = 1, \ldots, K$,
are complex numbers representing the in-phase and quadrature parts of the
backscatter signal at a particular frequency.

The target, whose identity is unknown, may be at unknown azimuth and
elevation relative to the radar. In cases where the aspect angle (azimuth and
elevation) of the object is known, the $M$ hypotheses may be regarded as "simple",
each hypothesis corresponding to a target class described by a single prototype.
When the aspect angle of the object is unknown, or known to be within some
range of angles, the decision must be made among $M$ composite hypotheses, each
hypothesis corresponding to a target class containing $N_s$ prototypes that represent
the possible aspect angles in the specified range.

For experimentation purposes, simulated radar returns were obtained from
the Ohio State University compact radar range as discussed in [9]. The compact
range data has been normalized so that all system related parameters have been
removed from the measurements. The compact range data is in units of dBm²,
which is the radar cross section of the target relative to 1 square meter. This unit
of measurement is also used to describe the average power of the noise used in the
simulation.

Figure  1  shows  the  structure  of  a  classifier  designed  to  identify  radar  returns 
where  observations  are  taken  sequentially.  The  classifier  includes  three  main  steps: 


used. This means that changing features at each stage of the sequential test is
a change in the frequency elements measured whenever a new observation is
requested. Thus, feature ordering for radar target identification is done by selecting
an optimum set of frequencies at each stage of the sequential test. Vectors of
observations that correspond to different sets of frequencies at each stage of the
sequential test describe the target in a better way than those utilizing the same
frequencies through the entire test [10]. If the sequential test involves class
rejection, then the optimum set of frequencies depends on the nature and number of
classes that are not rejected. The optimum set of frequencies includes those that
give the most "recognizable" returns when applied to different types of targets.

In this study, the set of frequencies used is an optimum set obtained by
application of feature selection algorithms [10]. The sequential observations are in
fact utilizing repeated measurements of the same set of optimum frequencies, but
the noise content of each measurement differs from stage to stage. Repeating the
same observations many times results in an effective improvement in the signal to
noise ratio [11]. As more observations are taken and the classifier makes use of
the previous measurements, the test eventually reaches a stage where a reliable
decision can be declared.
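
The signal to noise improvement from repeated measurements can be illustrated
with a short numerical sketch. It assumes, purely for illustration, that the noise
in each repetition is independent, zero mean, and of equal variance, and that the
repetitions are simply averaged; the function names are hypothetical and do not
come from [11].

```python
import math
import random


def averaging_gain_db(n_repeats):
    """SNR gain in dB from averaging n_repeats measurements with independent noise."""
    return 10.0 * math.log10(n_repeats)


def variance_of_average(n_repeats, sigma, trials=4000, seed=0):
    """Monte Carlo estimate of the noise variance remaining after averaging."""
    rng = random.Random(seed)
    means = [
        sum(rng.gauss(0.0, sigma) for _ in range(n_repeats)) / n_repeats
        for _ in range(trials)
    ]
    # The noise is zero mean, so the mean square of the averages estimates the variance.
    return sum(m * m for m in means) / trials
```

Under these assumptions the residual noise variance falls as $\sigma^2/n$, which is
the sense in which additional observations make the later stages of the test more
reliable.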

1.1  Purpose  of  the  Study 

The optimal classifier is to be designed such that it requires the least possible
number of observations given a fixed probability of misclassification. To achieve
this goal, the algorithms that have already been proposed in the literature are
investigated and compared on the basis of their performance.

Modifications of the techniques are suggested as a measure to improve their
performance tradeoff between the expected number of observations $E\{n\}$ and the
probability of misclassification. Ways of converting some techniques that require
a predetermined fixed number of observations into sequential techniques are also
suggested in this study.

This study considers the case where complete a priori knowledge of the
statistics of the random observations is available and the case where such knowledge
is not complete. The former case is known as "parametric" and the latter as
"nonparametric".

The concept of the sequential hypothesis test for the binary case, as introduced by
Wald, is discussed in Chapter II. Several generalizations of this technique for the
$M$-ary case and some modifications of these techniques are discussed in Chapter III.
An $M$-ary sequential classification technique based on a tree structured algorithm
is also proposed in Chapter III. In Chapter IV, a sequential version of the
nonparametric nearest-neighbor (NN) method of pattern recognition is considered.

The performance of the various techniques is evaluated in Chapter V by means
of computer simulation studies. In obtaining the results for Chapter V, the radar
signals are simulated using a set of stored reference patterns of five different
commercial aircraft, each corresponding to a class containing vector prototypes
representing observations of a particular aircraft at up to nineteen different azimuth
angles. (See [1] for a discussion of the generation and characteristics of the aircraft
catalog database.)


CHAPTER  II 


The  Concept  of  Sequential  Hypotheses  Testing 

A sequential test is an adaptive procedure to decide among two or more
alternative hypotheses, where observations are taken sequentially until a decision is
available. If the parameter to be minimized is the average number of observations
(samples), then Wald [3] showed that the sequential probability ratio test (SPRT)
for the binary case (two hypotheses) requires in the mean the least number of
measurements. The SPRT, being superior in this sense to the classical tests with a
fixed number of observations, has been given much attention in the last three decades.
The sequential probability ratio test, as a binary hypothesis test, has become very
important in the field of radar detection. The application of sequential tests to the
detection problem minimizes the average detection time.

2.1  Observations  and  Associated  Probability  Distributions 

Consider the problem of classifying a set of $n$ $K$-dimensional vector
observations, $X = \{x^1, x^2, \ldots, x^n\}$, $x^l \in \mathbb{C}^K$, as belonging to one of $M$ classes,
where each class corresponds to an event $\omega_i$ for $i = 1, \ldots, M$. Let $p_n(x/\omega_i)$
denote the joint conditional density function of the $n$ random $K$-dimensional
vector observations. Let $P(\omega_i)$ denote the a priori class probability for the event
$\omega_i$, $i = 1, \ldots, M$. We assume that class $\omega_i$ is composed of $N_s$ subclasses,
corresponding to the events $\omega_{i,1}, \ldots, \omega_{i,N_s}$, each of which represents the target of
class $\omega_i$ at a distinct azimuth angle, and let $p(x^l/\omega_{i,j})$ denote the joint conditional
probability density function of the $l$th observation given subclass $\omega_{i,j}$.

The a posteriori probability of the event (class) $\omega_i$ is given as

\[ P(\omega_i/x^l) = \frac{P(\omega_i)\, p(x^l/\omega_i)}{p(x^l)} \tag{2.1} \]

where

\[ p(x^l/\omega_i) = \sum_{j=1}^{N_s} P(\omega_{i,j})\, p(x^l/\omega_{i,j}) \tag{2.2} \]

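Assuming the $M$ classes are equally probable, then for Gaussian distributed
random observations the joint conditional density function, which is a Gaussian
mixture (for $n$ observations), is expressed as follows:

\[ p_n(x/\omega_i) = \prod_{l=1}^{n} p(x^l/\omega_i) \tag{2.3} \]

\[ = \prod_{l=1}^{n} \frac{1}{N_s} \sum_{j=1}^{N_s} \frac{1}{(2\pi)^{K} \sigma^{2K}} \exp\left[ -\sum_{k=1}^{K} \frac{\mathrm{Re}^2\{x_k^l - s_{i,j,k}\} + \mathrm{Im}^2\{x_k^l - s_{i,j,k}\}}{2\sigma^2} \right] \tag{2.4} \]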

where $x_k^l$ is the $k$th frequency component of the $l$th measurement vector, and $s_{i,j,k}$
is the $k$th frequency component of the prototype corresponding to the $j$th subclass
(azimuth angle) of class $\omega_i$. The joint density function $p(x^l)$ in (2.1) is given by

\[ p(x^l) = \sum_{i=1}^{M} P(\omega_i)\, p(x^l/\omega_i) \tag{2.5} \]
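
The class likelihood and posterior of this section can be sketched in a few lines.
This is a minimal illustration, not the implementation used in the study: it
assumes equally weighted subclasses, a common noise variance $\sigma^2$ in the real and
imaginary parts of every frequency component, and the per-component normalization
$1/(2\pi\sigma^2)$. The function names `log_mixture_density` and `posteriors` are
hypothetical.

```python
import math


def log_mixture_density(x, prototypes, sigma):
    """log p(x / w_i) for one complex K-vector x and the N_s prototypes of class w_i."""
    k_dim = len(x)
    log_norm = -k_dim * math.log(2.0 * math.pi * sigma ** 2)
    terms = []
    for s in prototypes:
        # |x_k - s_k|^2 equals Re^2{x_k - s_k} + Im^2{x_k - s_k}
        dist2 = sum(abs(xk - sk) ** 2 for xk, sk in zip(x, s))
        terms.append(log_norm - dist2 / (2.0 * sigma ** 2))
    peak = max(terms)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms) / len(terms))


def posteriors(x, classes, sigma, priors=None):
    """Posterior P(w_i / x) for each class, with equal priors unless given."""
    m = len(classes)
    priors = priors or [1.0 / m] * m
    logs = [math.log(p) + log_mixture_density(x, protos, sigma)
            for p, protos in zip(priors, classes)]
    peak = max(logs)
    weights = [math.exp(v - peak) for v in logs]
    total = sum(weights)  # plays the role of p(x) up to a common factor
    return [w / total for w in weights]
```

For $n$ independent observations the joint log-likelihood of a class is the sum of
`log_mixture_density` over the observations, which is how the product over $l$ is
usually evaluated in practice.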

Let the conditional probability of deciding hypothesis $\omega_i$ when the unknown
target is a member of class $\omega_j$ be $e(i,j)$. Thus, $e(i,j)$, $i \neq j$, is the probability of
misclassifying a target from class $\omega_j$ as class $\omega_i$, while $e(j,j)$ is the probability of
correctly classifying that target.

2.2  Binary  Case:  The  Sequential  Probability  Ratio  Test


The sequential probability ratio test (SPRT) and other sequential hypothesis
tests discussed below are based on the set of pairwise likelihood ratios, $L_{i,j}(n)$,
calculated at the $n$th stage of the test among the $M$ possible hypotheses, given as

\[ L_{i,j}(n) = \frac{p_n(x/\omega_i)}{p_n(x/\omega_j)} \tag{2.6} \]

for $i, j = 1, \ldots, M$.

The sequential probability ratio test due to Wald [3] for testing a simple
hypothesis or class $\omega_1$ against a single alternative $\omega_2$ ($M = 2$) proceeds as follows:

1. Compute the likelihood ratio $L_{1,2}(n)$ based on $n$ observations.

2. If $L_{1,2}(n) > A$, decide $\omega_1$.
   If $L_{1,2}(n) < B$, decide $\omega_2$.
   Otherwise, increment the number of measurements, $n$, and repeat the test.


In this test, the parameters $A$ and $B$ are chosen so that:

\[ A = \frac{1 - e(2,1)}{e(1,2)} \tag{2.7} \]

\[ B = \frac{e(2,1)}{1 - e(1,2)} \tag{2.8} \]

This test is optimal in the sense that it achieves the prescribed error probabilities
with the smallest average number of required measurements, $E\{n\}$ [3].

Equations (2.7) and (2.8) define the decision boundaries which partition the
feature space into three regions: the region where $\omega_1$ is chosen; the region where
$\omega_2$ is chosen; and the region of indifference (or null region). The null region is the
region in which no terminal decision is made. This region is a major factor in
determining the total number of observations and the probability of misclassification.
As the null region becomes larger, the test becomes longer and more observations
must be taken, and the probability of error is reduced.
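
The binary test described in this section can be sketched as follows. This is a
minimal illustration that assumes independent observations, so that the logarithm
of the likelihood ratio accumulates as a sum; the helper names `wald_thresholds` and
`sprt`, and the user-supplied per-observation function `log_lr`, are hypothetical.

```python
import math


def wald_thresholds(e12, e21):
    """Thresholds A and B from the error probabilities e(1,2) and e(2,1)."""
    upper = (1.0 - e21) / e12
    lower = e21 / (1.0 - e12)
    return upper, lower


def sprt(samples, log_lr, upper, lower):
    """Run the SPRT; return (decision, n) with decision 1, 2, or None if undecided."""
    log_a, log_b = math.log(upper), math.log(lower)
    llr = 0.0
    n = 0
    for x in samples:
        n += 1
        llr += log_lr(x)  # log of p(x/w1) / p(x/w2), assumed independent per stage
        if llr > log_a:
            return 1, n
        if llr < log_b:
            return 2, n
    return None, n  # data exhausted while still inside the null region
```

Working with the logarithm of the ratio avoids numerical underflow when many
observations are accumulated, and leaves the decision rule unchanged because the
logarithm is monotonic.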

2.3  Example 


Suppose that $x_1, x_2, \ldots, x_n$ are $n$ independent measurements, each with
probability density function $p(x/\omega_i)$, $i = 1, 2$, a univariate Gaussian with mean $s_i$ and
variance $\sigma^2$.


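The logarithm of the likelihood ratio $L_{1,2}(n)$ at the $n$th stage of the test is given as:

\[ \lambda_n = \log(L_{1,2}(n)) = \sum_{j=1}^{n} \log\left( \frac{p(x_j/\omega_1)}{p(x_j/\omega_2)} \right) \tag{2.9} \]

\[ \lambda_n = \frac{s_1 - s_2}{\sigma^2} \sum_{j=1}^{n} \left[ x_j - \frac{1}{2}(s_1 + s_2) \right] \tag{2.10} \]

The decision procedure becomes as follows (for $s_1 > s_2$): decide $\omega_1$ if

\[ \sum_{j=1}^{n} x_j \geq \frac{\sigma^2}{s_1 - s_2} \log A + \frac{n}{2}(s_1 + s_2) \tag{2.11} \]

and decide $\omega_2$ if

\[ \sum_{j=1}^{n} x_j \leq \frac{\sigma^2}{s_1 - s_2} \log B + \frac{n}{2}(s_1 + s_2) \tag{2.12} \]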

Otherwise  another  observation  is  required. 

The distance between the thresholds that define the null region is

\[ d = \frac{\sigma^2}{s_1 - s_2} \log\frac{A}{B} \tag{2.13} \]

Notice that as the variance $\sigma^2$ increases or as the class means $s_1$ and $s_2$ approach
each other, the null region becomes larger. Thus, more measurements are required
if the noise level increases, or if the targets to be classified are similar to each other.
This is due to the fact that as the null region becomes larger, more observations
are required in order to drive the test out of this region to either of the terminal
decision regions.
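
The Gaussian example can be checked numerically with the sketch below. It assumes
$s_1 > s_2$, so that the thresholds on the running sum keep the orientation used in
the example, and the function names are hypothetical.

```python
import math


def sum_thresholds(n, s1, s2, sigma, upper, lower):
    """Lower and upper thresholds on the running sum of n observations (s1 > s2)."""
    scale = sigma ** 2 / (s1 - s2)
    centre = 0.5 * n * (s1 + s2)
    return scale * math.log(lower) + centre, scale * math.log(upper) + centre


def null_region_width(s1, s2, sigma, upper, lower):
    """Distance d between the two thresholds; it does not depend on n."""
    return sigma ** 2 / (s1 - s2) * math.log(upper / lower)


def gaussian_sprt(xs, s1, s2, sigma, upper, lower):
    """Sequential test on the running sum; returns (decision, n)."""
    total = 0.0
    n = 0
    for n, x in enumerate(xs, start=1):
        total += x
        low, high = sum_thresholds(n, s1, s2, sigma, upper, lower)
        if total >= high:
            return 1, n
        if total <= low:
            return 2, n
    return None, n
```

Doubling `sigma` in `null_region_width` multiplies the width by four, and halving
the separation of the means doubles it, which matches the qualitative remarks
above.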

In the above example, notice that the decision regions of the SPRT were
originally fixed by the thresholds $A$ and $B$. However, these regions depend on
the statistical parameters of the random observation $x$. At high noise levels, more
observations are required before a terminal decision can be made.

The experimental phase of this study deals with the performance of a radar
target identification system employing sequential techniques where the noise level
affects the decision boundaries and consequently the overall error probability. If
these parameters (variance or class means) are fixed then the boundaries of the
decision areas are uniquely defined by the thresholds $A$ and $B$ and hence by the
error probabilities $e(1,2)$ and $e(2,1)$.

Prediction of the number of observations required before terminating the
sequential test might give an idea about the test length. In [3] an expression is given
for the expected number of measurements $E\{n\}$ that an SPRT requires assuming
a Gaussian distribution. Predicting the expected number of samples in the
binary hypothesis tests is much simpler than in the $M$-ary hypothesis tests.
This is due to the fact that the probability analysis in the binary case is not as
complicated as in the $M$-ary case, especially in defining the decision regions and the
corresponding error probabilities.

2.4  Modified  Sequential  Probability  Ratio  Test  (MSPRT) 

The SPRT due to Wald is optimal in the sense of minimizing the average
number of observations $E\{n\}$ with fixed error probability. It is, however, possible
for this test to require an unreasonable number of measurements before reaching a
decision. For this reason, various modifications of this test have been considered,
including an abrupt truncation of the test at some value $n = N$.

As an alternative to an abrupt truncation, a modification of the thresholds $A$
and $B$ is suggested in [4]. Here, the decision boundaries incorporate a dependence
on $n$ so that the new thresholds approach a common level at $n = N$ (see Figure
2). The form of the modified thresholds given in [4] is

\[ A(n) = \exp\left[ c_1 \left( 1 - \frac{n}{N} \right)^{r} \right] \tag{2.14} \]

\[ B(n) = \exp\left[ -c_2 \left( 1 - \frac{n}{N} \right)^{r} \right] \tag{2.15} \]

for constants $c_1$, $c_2$ and $r$. Notice that as the maximum number of measurements,
$N$, increases, this test reduces to Wald's test.

In [4] it is shown that with a proper choice of the coefficients $c_1$, $c_2$ and $r$, the
number of measurements can be limited while retaining the low error probability of
the Wald test. The relation between the average number of measurements required
for the Wald test, $E_W\{n\}$, and the average number of measurements required for
the modified test, $E_M\{n\}$, is given by

\[ E_M\{n\} = \frac{1}{1 + r} E_W\{n\} \tag{2.16} \]

Thus, the modified test limits the maximum number of measurements and reduces
the average number of measurements at the expense of a slight increase in the
error probability.

Using the modification discussed above, the decision procedure in the previous
example becomes as follows: if

\[ \sum_{j=1}^{n} x_j \geq c_1 \frac{\sigma^2}{s_1 - s_2} \left( 1 - \frac{n}{N} \right)^{r} + \frac{n}{2}(s_1 + s_2), \quad x \in \omega_1 \tag{2.17} \]

also, if

\[ \sum_{j=1}^{n} x_j \leq -c_2 \frac{\sigma^2}{s_1 - s_2} \left( 1 - \frac{n}{N} \right)^{r} + \frac{n}{2}(s_1 + s_2), \quad x \in \omega_2 \tag{2.18} \]

Figure 2: SPRT and MSPRT decision regions

The distance between thresholds at the $n$th stage of the test is

\[ d(n) = \frac{\sigma^2}{s_1 - s_2} (c_1 + c_2) \left( 1 - \frac{n}{N} \right)^{r} \tag{2.19} \]

Notice that as $n \to N$ the separation between the two decision boundaries
approaches zero. At $n = N$ the region associated with $\omega_1$ and the region associated
with $\omega_2$ meet, eliminating the null region, and the test is terminated. In [4] it is
shown that by adjusting the starting points of the stopping boundaries it is possible
to achieve error probabilities nearly as low as those in Wald's SPRT.
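
The converging boundaries of the modified test can be sketched as below. The
sketch works with logarithms of the thresholds, assumes the lower boundary carries
a negative sign so that both boundaries meet at zero when $n = N$, and assumes
independent per-stage log likelihood ratios; the function names are hypothetical.

```python
import math


def msprt_log_thresholds(n, n_max, c1, c2, r):
    """Return (log B(n), log A(n)) for stage n of a test truncated at n_max."""
    shrink = (1.0 - n / n_max) ** r
    return -c2 * shrink, c1 * shrink


def msprt(log_lrs, n_max, c1, c2, r):
    """Modified SPRT on a sequence of per-stage log likelihood ratios."""
    llr = 0.0
    n = 0
    # zip stops at stage n_max, where both thresholds equal zero and the
    # test is forced to a terminal decision.
    for n, z in zip(range(1, n_max + 1), log_lrs):
        llr += z
        log_b, log_a = msprt_log_thresholds(n, n_max, c1, c2, r)
        if llr >= log_a:
            return 1, n
        if llr <= log_b:
            return 2, n
    return None, n  # fewer than n_max observations were supplied
```

With a very large `n_max` the factor `shrink` stays close to one over the stages of
interest, so the boundaries are nearly constant, consistent with the remark that
the test then reduces to Wald's test.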


2.5  Group  Sequential  Tests 


In a group sequential test, observations are taken in groups rather than one
observation at a time. The motivation behind this approach is the fact that the
classical SPRT is a complex procedure that requires many operations at each stage
of the test. A comparison must be performed after each observation, a feedback
signal is required to request an additional observation, and a new hypothesis test
is required at each stage of the test. The sequential observation policy suggested
in [12] reduces the computation time of the SPRT and proceeds as follows:

1.  At  each  stage  of  the  sequential  test,  a  test  statistic  is  calculated  based  on  Nq 
observations. 

2. A two-threshold test similar to the Wald test is performed. No more
observations are requested when any of the Wald thresholds is crossed. Otherwise,
this group of observations is discarded, a new group is observed and the test
is repeated.

This approach is called a "memoryless grouped-data sequential procedure". This
technique is easier to implement than the Wald test, but is not optimal in any
sense. However, this approach has better performance than tests with a fixed
number of observations [12]. The reduction in the average number of measurements
using this approach can reach 60 percent compared to tests which use a fixed number
of observations. The SPRT, on the other hand, reduces the number of
measurements by up to 72 percent (see [12]). Truncation of the above test is simpler than
the truncation of the SPRT because the number of observations in this case is
geometrically distributed [12].
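
The memoryless grouped-data procedure can be sketched as follows. The sketch
assumes that the test statistic of a group is the sum of the per-observation log
likelihood ratios within that group, which is one natural choice but is not stated
in the text above; the function name and arguments are hypothetical.

```python
def grouped_memoryless_test(log_lrs, group_size, log_a, log_b):
    """Test successive groups independently; return (decision, observations used)."""
    n_used = 0
    # Only complete groups are tested; a trailing partial group is ignored.
    for start in range(0, len(log_lrs) - group_size + 1, group_size):
        group = log_lrs[start:start + group_size]
        n_used += group_size
        statistic = sum(group)  # earlier groups are discarded, hence "memoryless"
        if statistic >= log_a:
            return 1, n_used
        if statistic <= log_b:
            return 2, n_used
    return None, n_used
```

Because each group is tested independently with the same thresholds, the number of
groups needed to terminate follows a geometric distribution, which is the property
that makes truncation of this test simple to analyze.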


The  above  technique  is  useful  whenever  the  number  of  observations  required 
to  terminate  the  sequential  test  is  large.  This  might  not  be  the  case  in  radar 
target  identification  where  minimizing  the  number  of  observations  (test  length)  is 
the  main  concern. 

2.6  Relative  Efficiency  of  the  Sequential  Probability  Ratio  Test 


The  efficiency  of  the  sequential  probability  ratio  test  is  defined  as  the  ratio 
of  the  expected  number  of  measurements  in  the  sequential  test  to  the  number 
of  measurements  required  by  a  fixed  number  of  observations  test  to  achieve  the 
same  error  probability.  This  ratio  represents  the  performance  of  the  sequential 
test  in  classifying  targets  with  a  certain  error  probability  and  minimum  number  of 
observations. 
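
A numerical feel for this ratio can be obtained from the sketch below for the
Gaussian example of Section 2.3. It is not the expression derived in [13]: it
assumes Wald's standard approximation for the expected sample number, which
neglects the overshoot of the thresholds, and it writes $\alpha = e(1,2)$, $\beta = e(2,1)$
and $d = (s_1 - s_2)/\sigma$. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def fixed_sample_size(alpha, beta, d):
    """Observations a fixed-length Gaussian test needs for error rates alpha, beta."""
    z = NormalDist().inv_cdf
    return ((z(1.0 - alpha) + z(1.0 - beta)) / d) ** 2


def sprt_expected_n(alpha, beta, d, truth):
    """Wald's approximation to E{n} when class `truth` (1 or 2) is the true class."""
    log_a = math.log((1.0 - beta) / alpha)
    log_b = math.log(beta / (1.0 - alpha))
    drift = 0.5 * d * d  # mean log likelihood ratio per observation under w1
    if truth == 1:
        return ((1.0 - beta) * log_a + beta * log_b) / drift
    return (alpha * log_a + (1.0 - alpha) * log_b) / (-drift)


def relative_efficiency(alpha, beta, truth):
    """Ratio of expected SPRT length to fixed sample size; d cancels out."""
    return sprt_expected_n(alpha, beta, 1.0, truth) / fixed_sample_size(alpha, beta, 1.0)
```

Under these assumptions the ratio depends only on the two error probabilities,
since the separation $d$ scales the sequential and fixed-length tests identically.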

The relative efficiency is derived for the binary hypothesis case in [13,4]. In
[13], the efficiency of the Wald sequential test to discriminate between the two
hypotheses, $\omega_1$ and $\omega_2$, is given as:

e(2,l)log  )  +  (1  -  *(2,l))i°g  (pjjgn) 

’  (V-1(e(2,l))  +  ^-i((e(l,2)P)  (2'20) 

when $\omega_2$ is true, and

e(l,2)log  ( ‘ e^Af 1 )  +  t1  ~  e(l,2))iog :  ((i!fl$}i)) 


(2.21) 


when $\omega_1$ is true, where $\phi^{-1}(\cdot)$ is the inverse of the standard normal distribution
function $\phi(\cdot)$. For the derivation of the above equations see [13].

Notice that the efficiency of the SPRT depends on the error probabilities
chosen to define the decision boundaries of the sequential test. Since no estimates
for the average number of observations required by $M$-ary sequential tests exist, the
relative efficiency of any multiple hypothesis test can be computed experimentally
only.


CHAPTER  III 


M-ary  Hypothesis  Tests:  Parametric  Techniques 


This chapter is concerned with parametric $M$-ary hypothesis testing
techniques where $M$ is the number of possible classes, $M \geq 3$. The joint conditional
density function $p_n(x/\omega_i)$ of the $n$ random $K$-dimensional vector observations
is assumed to be known. The a priori class probabilities $P(\omega_i)$ for the events
$\omega_i$, $i = 1, \ldots, M$, are also known. Thus, the case of a random observation with
known statistical parameters is considered in this chapter.

3.1  Bayes  Sequential  Test 

From a decision-theoretic standpoint, the most reliable test for deciding among
$M$ hypotheses is the Bayes sequential procedure discussed in [4]. This test is
optimal in that it minimizes the Bayes risk for a given set of cost functions and
prior probabilities. Unfortunately, from the standpoint of implementation, the
complexity of this test is a major concern. At each stage of the sequential process
it is necessary to find the expected risk of making a decision, as well as the risk
of continuing the test. Dynamic programming is used to implement this technique
[4].

The intuitive argument for using dynamic programming for a finite sequential
classification problem can be stated as follows: With observations taken one at a
time, each stage of the test is a decision problem including both the choice of taking
an additional observation and that of terminating the sequential test. It is easy to
determine the expected risk involved in the decision when the test is terminated.
However, it is difficult to compute the expected risk involved in taking an additional
observation [14].

A new measurement is requested if the expected risk of continuing is less than the
risk of terminating the sequential test. That is, observations are repeated if the
following inequality is satisfied:

\[ C(x_1, \ldots, x_n) + \int \rho_n(x_1, \ldots, x_{n+1})\, dP(x_{n+1}/x_1, \ldots, x_n) < \min_i R(x_1, \ldots, x_n; d_i) \tag{3.1} \]

where $R(x_1, x_2, \ldots, x_n; d_i)$ is the average risk of choosing the $i$th class after
taking $n$ measurements, $C(x_1, x_2, \ldots, x_n)$ is the cost of these $n$ observations and
$\rho_n(x_1, \ldots, x_{n+1})$ is the average risk of the $(n+1)$th observation. While it is
possible to implement the Bayes procedure using dynamic programming techniques, this
test has found limited applications because of the required complexity, especially
in situations where rapid decisions are desired.
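
The backward recursion behind the Bayes procedure can be illustrated on a toy
problem. The sketch below is not the procedure of [4]; it assumes two classes,
binary valued observations with known probabilities `p1` and `p2` of observing a one
under each class, a unit loss for a wrong decision, and a constant cost per
observation. All names are hypothetical.

```python
def bayes_risk(pi1, stages_left, cost, p1, p2):
    """Minimum expected risk given posterior pi1 for class 1 and a stage budget."""
    stop_risk = min(pi1, 1.0 - pi1)  # risk of the best terminal decision now
    if stages_left == 0:
        return stop_risk
    continue_risk = cost
    for like1, like2 in ((p1, p2), (1.0 - p1, 1.0 - p2)):
        marginal = pi1 * like1 + (1.0 - pi1) * like2  # probability of this outcome
        if marginal > 0.0:
            updated = pi1 * like1 / marginal  # posterior after the new observation
            continue_risk += marginal * bayes_risk(updated, stages_left - 1, cost, p1, p2)
    # Another observation is requested only when continuing is cheaper than stopping.
    return min(stop_risk, continue_risk)
```

Even in this toy setting the recursion visits every possible observation sequence,
so its cost grows exponentially with the number of remaining stages, which
illustrates why the general procedure is considered too complex when rapid
decisions are needed.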

In the remainder of this chapter, we consider various aspects of sequential
$M$-ary tests that are substantially less complex than the Bayes procedure. Because
of the relative ease of implementation, the tests discussed below are candidate
procedures for the classification of radar signals. Unfortunately, these tests produce
higher error probabilities, or require more observations, on the average, than the
Bayes sequential procedure. In order to compare the performance of these tests,
it is necessary to evaluate the error probability and average number of required
measurements for a variety of cases. The performance evaluation of these tests is
the subject of Chapter V.


3.2  Pairwise  Likelihood  Ratios:  The  Armitage  Test 


The first $M$-ary technique we consider is due to Armitage [6]. At each stage
of the test, this approach involves the comparison of all $M(M-1)/2$ pairwise
likelihood ratios, $L_{i,j}(n)$, with a set of properly chosen thresholds, $A_{ij}$. The
Armitage algorithm is restrictive in the sense that all $M - 1$ likelihood ratios for
hypothesis $\omega_i$ must simultaneously exceed their respective thresholds in order for
$\omega_i$ to be selected.

This algorithm is summarized as follows:

1. Compute $L_{i,j}(n)$, $i, j = 1, 2, \ldots, M$, $i \neq j$.

2. If $L_{i,j}(n) > A_{ij}$ $\forall j = 1, 2, \ldots, M$, $j \neq i$, decide class $\omega_i$.
   Otherwise, increment the number of measurements, $n$, and repeat the test.
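
The two steps above can be sketched as follows. The sketch assumes independent
observations, so that class log-likelihoods accumulate as sums, and it returns
zero-based class indices; the function names and the input layout are hypothetical.

```python
def armitage_step(log_liks, log_thresholds):
    """Return the class whose pairwise log ratios all exceed their thresholds, else None."""
    m = len(log_liks)
    for i in range(m):
        if all(log_liks[i] - log_liks[j] > log_thresholds[i][j]
               for j in range(m) if j != i):
            return i
    return None


def armitage_test(per_obs_log_liks, log_thresholds):
    """per_obs_log_liks: one list per observation of log p(x^l / w_i) for each class."""
    m = len(log_thresholds)
    totals = [0.0] * m
    n = 0
    for n, obs in enumerate(per_obs_log_liks, start=1):
        totals = [t + v for t, v in zip(totals, obs)]  # log p_n(x / w_i)
        decision = armitage_step(totals, log_thresholds)
        if decision is not None:
            return decision, n
    return None, n
```

With thresholds no smaller than one (non-negative logarithms), at most one class
can satisfy the condition at any stage, so the order in which classes are examined
does not affect the outcome.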


The constants $A_{ij}$ could conveniently be made equal to a fixed threshold $A$. In this
case, the inequalities specify that pattern observation continues until the likelihood
function of one of the hypotheses exceeds $A$ times each of those of the other hypotheses.
In [6] it is shown that the probability of reaching a decision approaches one as
the number of measurements increases. The decision probabilities, $e(i,i)$, and the
thresholds, $A_{ij}$, for this test are related by:

\[ e(i,i) > 1 - \sum_{j \neq i} \frac{1}{A_{ji}} \tag{3.2} \]

These inequalities show that the probability of correct decision may be made large
if the thresholds are chosen sufficiently large. Notice that for $M = 2$, the threshold
is identical to the Wald test threshold. However, this test might be considered as
$M(M-1)/2$ binary tests where each of these binary tests has one decision area
defined by the above threshold. Notice that for $M \geq 3$, the above threshold is
lower than that specified by Wald for the binary case. Unfortunately, there is also
a direct relationship between the size of the thresholds and the average number of
required measurements. In addition, this test, like the Wald test, is not limited to
a maximum number of measurements.

3.3  Modifications  of  the  Armitage  Thresholds 


In order to develop a sequential test that reduces the average number of
required measurements while retaining the simplicity of an approach based on a
comparison of pairwise likelihood ratios, we consider a modification of the
Armitage technique that is analogous to the modification of the Wald test discussed
in Section 2.4. In particular, we form a set of thresholds that depend on the
number of measurements, $n$, as:

A'ij(n)  =  i  =  1, 2, . . . ,  M  i  #  j  (3.4) 

where $A_{ij}$ are the original thresholds defined by Armitage and $r$ is a constant.
Notice that if $r = 0$, the thresholds correspond to the Armitage thresholds. Also,
notice that the above modification does not place a limit on the maximum number
of measurements required. However, as shown in Chapter V, the thresholds for the
case where $r = 1$ significantly reduce the average number of required
measurements, $E\{n\}$, while having little effect on the error probability of the classifier.

The motivation behind such a modification is that this technique gives very
good results at low noise levels. Thus, it is sometimes possible to reduce the number
of observations, provided that any slight increase in the probability of error does
not change the performance of this technique as a whole. However, if $r > 3$ then
the change in the error probability becomes significant. This is an expected result
because, as $r$ becomes significantly large, the null region is reduced at the expense
of larger decision areas. Thus, terminating the sequential test becomes more likely.

The disadvantage of the above mentioned modification is that it does not
guarantee a termination of the sequential test within a reasonable number of
measurements.

3.4  Geometric  Mean  Comparison:  The  Reed  Test 

The second type of algorithm we consider is based on a comparison of the
individual likelihood functions, $p_n(x/\omega_i)$ for each class, to the geometric mean of
the M likelihood functions. This test, due to Reed [7], is characterized by a decision
occurring on the basis of class rejection rather than class acceptance. That is, while
the tests discussed above formulate a decision on the basis of the "most likely"
class hypothesis, the approach used in the Reed test is a sequential elimination of
the "least likely" class hypotheses until a single remaining hypothesis is chosen.

Since this method compares each of the likelihood functions to a common
geometric mean, only M ratios must be computed, a significant reduction from the
$M(M-1)/2$ pairwise likelihood ratios required for the Armitage test. In addition,
as the least likely alternative classes are eliminated, even fewer computations are
required as the test progresses until a single likelihood ratio is used to terminate
the test. The implementation of the Reed approach may be summarized as in [4]:

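1. For each of the $M_r$ remaining candidate classes (initially, $M_r = M$)
$\theta_i \in \{\omega_1, \ldots, \omega_M\}$, $i = 1, \ldots, M_r$, compute

$$U_n(x/\theta_i) = \frac{p_n(x/\theta_i)}{\left[\prod_{j=1}^{M_r} p_n(x/\theta_j)\right]^{1/M_r}} \tag{3.5}$$

2. For $i = 1, 2, \ldots, M_r$, reject class $\theta_i$ if $U_n(x/\theta_i) < A_i$.

3. If just one class $\theta_i$ remains, decide hypothesis i.
Otherwise, increment the number of measurements, n, and repeat the test
based on the remaining classes.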
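The three steps above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the implementation used in this study: the function name `reed_sequential_test`, the real-valued Gaussian likelihood with known variance `sigma2`, and the choice of picking the most likely remaining class when the measurements run out are all assumptions introduced here. The ratio to the geometric mean is evaluated in the log domain, where it becomes a difference between a log-likelihood and the mean of the log-likelihoods.

```python
import numpy as np

def reed_sequential_test(observations, means, sigma2, log_thresholds):
    """Sketch of the Reed (geometric-mean) test in the log domain.

    observations   : (N, K) real array, one row per measurement
    means          : (M, K) array, one hypothetical class mean per row
    sigma2         : noise variance (assumed known and equal for all classes)
    log_thresholds : (M,) array of log A_i
    Returns (decided class index, number of measurements used).
    """
    num_classes = means.shape[0]
    active = list(range(num_classes))   # classes not yet rejected
    log_lik = np.zeros(num_classes)     # running log p_n(x/theta_i), up to a constant
    for n, x in enumerate(observations, start=1):
        # Gaussian log-likelihood increment; the common constant cancels in U_n.
        log_lik += -np.sum((x - means) ** 2, axis=1) / (2.0 * sigma2)
        ll = log_lik[active]
        # log U_n = log p_i - (1/M_r) * sum_j log p_j over the remaining classes
        log_u = ll - ll.mean()
        survivors = [c for c, u in zip(active, log_u) if u >= log_thresholds[c]]
        if not survivors:
            # Assumption: if all classes fall below threshold, keep the most likely.
            survivors = [active[int(np.argmax(ll))]]
        active = survivors
        if len(active) == 1:
            return active[0], n
    # Assumption: on running out of measurements, pick the most likely survivor.
    best = max(active, key=lambda c: log_lik[c])
    return best, len(observations)
```

With three hypothetical class means and noise-free measurements of the second class, the least likely class is rejected at the first stage and the decision is reached at the second stage.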


In the Reed algorithm, the thresholds $A_i$ are independent of the number of
measurements, n, and are related to the decision probabilities as:

$$A_i = \frac{1 - e(i,i)}{\left[\prod_{j=1}^{M} \left(1 - e(i,j)\right)\right]^{1/M}} \tag{3.6}$$

which suggests that this test is characterized by some of the same disadvantages
as the original forms of the Wald and Armitage tests. In [4], Fu suggested the
application of n-dependent thresholds, $A_i(n)$, given by


A,(n) 


'‘i'-V 


(3.7) 


for  constant  r,  where  N  is  the  prespecified  maximum  number  of  allowed  measure¬ 
ments. 

Notice that the ratio $U_n(x/\theta_i)$ can be considered as the Mth root of the
product of M likelihood ratios $L_{ij}$, $j = 1, \ldots, M$, where $L_{ij}$ is the likelihood ratio
for the binary case.

The Reed algorithm requires the computation of the geometric mean of $M_r$
hypotheses. Computational problems may result since this product is near zero
whenever one of the hypotheses is unlikely. In the Armitage algorithm, such an
unlikely hypothesis would affect only one likelihood ratio provided that M - 1
likelihood functions are considered before making a decision. However, in the Reed test,
the unlikely hypothesis affects the entire test. Thus, whenever unlikely hypotheses
exist, the decision criterion faces a numerical problem.
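The numerical problem described above can be illustrated with a short Python sketch. The log-likelihood values are hypothetical. Working with log-likelihoods, where the geometric mean becomes an ordinary average, is a standard workaround and is not a technique proposed in this study.

```python
import numpy as np

def log_geometric_mean(log_lik):
    """Log of the geometric mean of the likelihoods, computed from log-likelihoods."""
    return float(np.mean(log_lik))

# Hypothetical log-likelihoods: the third class is extremely unlikely.
log_lik = np.array([-5.0, -7.0, -2000.0])

# Direct evaluation: exp(-2000) underflows to 0.0, so the whole product vanishes.
direct = np.prod(np.exp(log_lik)) ** (1.0 / len(log_lik))

# Log-domain evaluation stays finite.
stable = log_geometric_mean(log_lik)
```

Here `direct` collapses to zero, so any ratio formed with it is undefined, while `stable` remains a finite number that can be subtracted from each log-likelihood.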

3.5 Modifications of the Reed Algorithm


A suggested modification of the Reed test involves changing the likelihood
function $U_n(x/\theta_i)$ given above into the following form:

$$U'_n(x/\theta_i) = \frac{p_n(x/\theta_i)}{\frac{1}{M_r}\sum_{j=1}^{M_r} p_n(x/\theta_j)} \tag{3.8}$$

where $\theta_i \in \{\omega_1, \ldots, \omega_M\}$. Thus, instead of computing the geometric mean at each
stage of the test, we compute the arithmetic mean. By using such a likelihood
function, we eliminate the computational problems discussed above. A modification
in the threshold directly follows the modification in the likelihood function.
The new thresholds are:

$$A'_i = \frac{1 - e(i,i)}{\frac{1}{M}\sum_{j=1}^{M} \left(1 - e(i,j)\right)} \tag{3.9}$$

Notice that if the error probabilities e(i,j) are equal, then the thresholds $A_i$ and
$A'_i$ are equal.

Using this modification, a class $\theta_i$ is rejected if:

$$p_n(x/\theta_i) < A'_i \, \frac{1}{M_r}\sum_{j=1}^{M_r} p_n(x/\theta_j) \tag{3.10}$$

while, in the Reed algorithm, a class $\theta_i$ is rejected if:

$$p_n(x/\theta_i) < A_i \left[\prod_{j=1}^{M_r} p_n(x/\theta_j)\right]^{1/M_r} \tag{3.11}$$

The geometric mean being less than the arithmetic mean implies that the decision
region for the modified test is larger than that of the Reed test. Thus, fewer
observations are required to terminate this test. At low noise power levels the joint
probability density functions $p_n(x/\theta_i)$, $i = 1, \ldots, M$, are considerably different from
each other; thus, their geometric mean is much less than their arithmetic mean,
which means that the rejection region for the modified threshold is much larger
than that defined by the Reed test. Thus, at low noise power levels, the average
number of observations is reduced significantly by using the above modification.

The application of n-dependent thresholds suggested by Fu [4] is still valid in
this case.
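The difference between the two rejection rules can be checked numerically with the following sketch. The log-likelihood values and the common threshold are hypothetical, and the helper names are introduced here for illustration only. The arithmetic mean is evaluated with the usual max-shift trick so that it can be computed from log-likelihoods without underflow.

```python
import numpy as np

def log_arithmetic_mean(log_lik):
    """Log of the arithmetic mean of the likelihoods, computed stably."""
    m = np.max(log_lik)
    return float(m + np.log(np.mean(np.exp(log_lik - m))))

def rejected_geometric(log_lik, log_a):
    """Reed rule: reject class i if log p_i < log A + log(geometric mean)."""
    return log_lik < log_a + np.mean(log_lik)

def rejected_arithmetic(log_lik, log_a):
    """Modified rule: reject class i if log p_i < log A' + log(arithmetic mean)."""
    return log_lik < log_a + log_arithmetic_mean(log_lik)

# Hypothetical log-likelihoods for three remaining classes, same threshold for both rules.
log_lik = np.array([-2.0, -8.0, -12.0])
log_a = np.log(0.05)
```

For these values the geometric-mean rule rejects only the least likely class, while the arithmetic-mean rule also rejects the second class, which is the larger rejection region described in the text.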

3.5.1  Double  Thresholds:  The  Reed  Test 


The standard Reed test employs only one decision region. However, at high
noise power levels, rejecting a hypothesis is not usually done before taking many
observations. Thus, a suggested modification to the Reed algorithm is to add
another threshold $B_i$ to define an acceptance region; that is, a hypothesis $\theta_i$ is
accepted if $U_n(x/\theta_i) > B_i$, $\theta_i \in \{\omega_1, \ldots, \omega_M\}$. This modification improves the
performance of the Reed test at high noise power levels. The threshold $B_i$ is given
as:


Bi  = 


i  -£Si,M, •«(*■■) 

j£l  *-.*)]* 


(3.12) 


Also, for the arithmetic mean comparison, the threshold $B'_i$ is given as:
B'  = 


„  i-s 

Ej= 3  e(^j) 


(3.13) 


Using  this  modification  a  decision  can  be  reached  even  before  rejecting  any 
class.  Thus,  this  modification  reduces  the  hypotheses  testing  time  in  addition  to 
reducing  the  average  number  of  measurements  at  high  noise  power  levels. 


3.6  Single  Likelihood  Ratio:  The  Palmer  Test. 


Classes of sequential tests that are not direct extensions of the Wald test to M
hypotheses are sequential rank tests and tests where each test statistic is compared
to multiple thresholds. Examples of tests with both of these characteristics are
discussed in [15] and [16].


In [16], Palmer proposes a method based on the computation of the M likelihood
functions, $p_n(x/\omega_i)$, at each stage of the test. The decisions for this test are
made on the basis of the value of the ratio of the two largest likelihood functions.
This single likelihood ratio is compared to a threshold, A. The implementation of
this test is summarized as:

1. Compute $p_n(x/\omega_i)$ for all $i = 1, \ldots, M$.

2. Compute the likelihood ratio $L_{ij}(n)$ of the largest and second largest
likelihood functions.

3. If $L_{ij}(n) > A$, decide class $\omega_i$, the class with the largest likelihood function.
Otherwise, increment the number of measurements, n, and repeat the test.


In the Palmer test, the threshold $A_i$ is given as:

$$A_i = \frac{M}{4\left(1 - e(i,i)\right)^2} \tag{3.14}$$


This test gives a small error probability at low noise power levels with a small number
of observations. However, at high noise power levels, the error probability becomes
significantly large because the difference between the likelihood functions becomes very
small, and a decision based on the two largest likelihood functions is not reliable.
Finally, we point out that this test may also be modified to include an n-dependent
threshold to allow a non-abrupt truncation while limiting the maximum number
of measurements.
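A minimal Python sketch of the Palmer procedure is given below, written in the log domain so that the ratio of the two largest likelihood functions becomes a difference of log-likelihoods. The function name `palmer_test`, the real-valued Gaussian likelihood with known variance, and the fallback to the most likely class when the measurements run out are assumptions made for illustration.

```python
import numpy as np

def palmer_test(observations, means, sigma2, log_a):
    """Sketch of the single-likelihood-ratio (Palmer) test.

    observations : (N, K) real array, one row per measurement
    means        : (M, K) array of hypothetical class means
    sigma2       : noise variance (assumed known)
    log_a        : log of the threshold A
    Returns (decided class index, number of measurements used).
    """
    log_lik = np.zeros(means.shape[0])
    for n, x in enumerate(observations, start=1):
        # Step 1: update the log-likelihood of every class.
        log_lik += -np.sum((x - means) ** 2, axis=1) / (2.0 * sigma2)
        # Step 2: log-ratio of the largest and second largest likelihoods.
        order = np.argsort(log_lik)
        best, second = order[-1], order[-2]
        # Step 3: decide if the ratio exceeds the threshold.
        if log_lik[best] - log_lik[second] > log_a:
            return int(best), n
    # Assumption: when measurements run out, choose the most likely class.
    return int(np.argmax(log_lik)), len(observations)
```

When two class means are close together, the log-ratio grows slowly with each measurement, which is the situation in which the text notes that the test needs more observations or becomes unreliable.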


3.7  Sequential  Maximum  A  Posteriori  Test 

The maximum a posteriori probability (MAP) technique chooses the hypothesis
whose a posteriori probability is maximum. Thus, the decision is direct
(nonsequential) and the target membership is assigned to the most likely hypothesis.
Assuming equiprobable classes and equal cost functions, the joint conditional
density functions of the n random K-dimensional vector observations are enough
to decide one of the M possible hypotheses. That is, the hypothesis with the largest
likelihood function $p_n(x/\omega_i)$, $i = 1, \ldots, M$, is determined to be the class of the
unclassified target. In this method, no thresholds are required and the decision is
a one-shot procedure.

Assume n observations are taken and the sample average $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ is formed.
Then the expected value of $\bar{X}$ is $E\{\bar{X}\} = E\{X\}$; however, the variance of the sample
average is $\mathrm{Var}\{\bar{X}\} = \mathrm{Var}\{X\}/n$. Thus, as the number of samples, n, increases, the
variance of the averaged observation $\bar{X}$ decreases by a factor of n. This feature
motivates the following algorithm:

1. Compute $p_n(x/\omega_j)$, $j = 1, 2, \ldots, M$.

2. Compute $p_n(x/\omega_i) = \max_j \left[p_n(x/\omega_j)\right]$.

3. If $p_n(x/\omega_i) > A_i$, decide class $\omega_i$.
Otherwise, increment the number of measurements, n, and repeat the test.

The test is truncated after observing a maximum number of samples N by choosing
the class with maximum a posteriori probability. This technique gives reasonable
results, as shown in Chapter V.
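The variance argument used to motivate this algorithm can be checked with a small Monte-Carlo sketch. The function name, the number of trials, and the zero-mean Gaussian samples are assumptions chosen for illustration.

```python
import numpy as np

def sample_mean_variance(sigma2, n, trials=20000, seed=0):
    """Empirical variance of the average of n i.i.d. Gaussian samples of variance sigma2."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, np.sqrt(sigma2), size=(trials, n))
    return float(np.var(x.mean(axis=1)))

# With sigma2 = 4 and n = 8 the estimate should lie close to 4 / 8 = 0.5.
estimate = sample_mean_variance(4.0, 8)
```

The estimate approaches `sigma2 / n` as the number of trials grows, in line with the reduction of the variance by a factor of n.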


In [17], stopping rules for this test are derived from those of the Armitage
test. These stopping rules are applied to the likelihood functions rather than the
likelihood ratios. Thus, equation (3.3) can be modified as:


or 


Lj=i 


PnjX/uj)  >  _ 1 

P(uj)pn(x/ujj)  ~  P(W|.)  +  ££1jV. 


(3.15) 


P(»i/x)  > 


PM 


PM  + 


(3.16) 


This test can be considered as a comparison of the likelihood function of a class $\omega_i$
to the arithmetic mean of all likelihood functions. Thus, it requires the computation
of M likelihood ratios, while $M(M-1)/2$ likelihood ratios are required by the
Armitage test. However, this test is less restrictive than the Armitage test since
it does not employ pairwise comparison. Thus, the performance of this algorithm
is, at most, as good as that of the Armitage test.

This  test  can  be  modified  to  include  hypotheses  rejection  (using  (3.10)),  where 
a  class  is  rejected  if: 


Pnjx/uj)  <  1  ~  e(i,i) 

"  z"?(l  -e(U)) 
or 


(3.17) 


PM*)  ^ 


1  -  e(i,i) 

xfc  £,"?(!  -  =(•'.») 


(3.18) 


Moreover, this test can be considered as a Bayes sequential test with suboptimal
decision boundaries.


3.8  SPRT  Applied  to  Tree  Structured  M-ary  Hypothesis  Tests 


Algorithms for computing the optimal decision boundaries and predicting the
average number of observations, E{n}, exist for various versions of sequential
binary hypothesis tests. In this section the M-ary problem is treated as a sequence
of binary hypothesis comparisons. The test is composed of many levels where a
binary classification test is applied at each level. Thus, the M-ary test is reduced
to a set of binary tests.

The radar target identification problem that requires choosing one hypothesis
among M possible hypotheses can be solved by dividing the M classes into two
separate groups of classes, each including a certain set of classes that have similar
features. Each of these groups is then divided into two subgroups and so on.
Dividing groups into subgroups continues until only two classes are contained in
each group. The Wald SPRT is then applied at the various levels of the test. Thus,
choosing among M hypotheses is reduced to a multilevel binary test. At each level
of the test, a decision to reject a certain group or to repeat observations is made.

In  order  to  make  sure  that  the  test  terminates  with  a  finite  number  of  obser¬ 
vations,  truncation  must  be  employed  in  the  sequential  decision  procedure.  Trun¬ 
cation  can  take  place  in  two  forms.  The  first  form  involves  truncation  once  in  the 
entire  test  (see  Figure  3).  If  all  the  allowed  measurements  are  requested  before 
reaching  the  lowest  level  of  the  test  i.e.,  before  deciding  between  two  classes,  then 
the  test  continues  by  eliminating  the  null  region  for  the  rest  of  the  binary  tests. 
Thus,  after  truncation,  the  rest  of  the  binary  tests  are  considered  as  likelihood 
ratio  tests. 

The second form of termination employs truncation at various levels of the
M-ary test. Thus, new decision regions are formed whenever a new level is reached
(see Figure 4). Truncation at each stage should be done in such a way that the
total number of observations requested before the last truncation is equal to the
maximum allowed number of observations. Thus, if the M-ary test consists of L
levels and the maximum number of observations at each level $i = 1, 2, \ldots, L$ is
$N_i$, then $N = \sum_{j=1}^{L} N_j$ is the maximum number of allowed measurements for the
entire test.

The  number  of  levels  depends  on  the  number  of  classes  M,  as  well  as  on  the 
way  groups  are  selected  and  the  number  of  classes  m  <  M ,  in  each  group.  Groups 
may  contain  one  or  more  classes  depending  on  the  way  these  classes  are  separated. 
A  group  that  contains  one  class  needs  no  further  testing  once  the  test  reaches  its 
level. 

In  this  study,  grouping  is  performed  according  to  the  physical  similarities 
between  aircraft.  That  is,  grouping  is  based  on  physical  aspects  such  as  location 
of  the  engine,  shape  of  the  tail,  size  of  the  aircraft  etc...  Let  6\, . . .  ,6m  be  the 
classes  contained  in  a  group  0,-,  where  0j  G  {uq,  . . .  j  =  1, . . . ,  m.  Also,  let 

pn{xl /0,)  be  the  joint  conditional  density  function  of  the  Ith  observation.  Then 


f-we  ,•)  =  E  P(«Mx/«j) 

>= i 

If  n  samples  are  observed  then: 

n  m 

P(x',x2 . x"/6()=  n  E  **'/«;  )e(«i ) 

/=1  j=l 

where  p(Xl /#,)  is  given  by: 


(3.19) 


(3.20) 


1  [exp-  E{=i  &2(4  ~  +  S2(4  -  si  kJ) 


This test is not optimal because applying an optimal binary test (SPRT) at each
level does not mean that the entire decision procedure is optimal. However, the
average number of observations of the M-ary test, which is equal to the summation
of the average number of observations requested by the Wald test at each level, is
predictable.

Figure 3: Decision regions dependent on the number of observations

Figure 4: Decision regions dependent on the number of observations

An  optimal  choice  of  groups  would  definitely  improve  the  performance  of  the 
above  algorithm.  Moreover,  the  tree  algorithm  can  be  considered  as  a  combination 
of  both  Armitage  and  Reed  tests  since  it  includes  a  pairwise  comparison  among 
groups  and  employs  group  rejection  at  each  level  of  the  test. 
CHAPTER IV


M-ary Hypothesis Tests: Nonparametric Methods

4.1 Nonparametric Techniques

A common aspect of the algorithms discussed in the previous section is the
dependence of the decision statistics on the likelihood functions, $p_n(x/\omega_i)$.
Unfortunately, a useful statistical characterization of a radar system is not always
available  to  the  system  designer.  While  statistics  for  the  observed  patterns  can 
be  estimated  through  learning  processes,  it  may  be  undesirable  to  employ  a  para¬ 
metric  technique  designed  for  a  particular  system  in  hopes  that  the  algorithm  is 
robust  with  respect  to  mismatched  system  characteristics.  In  this  case,  it  is  often 
necessary  to  employ  some  form  of  nonparametric  classification  technique.  Thus, 
nonparametric  classifiers  are  used  for  situations  where  the  probability  distribution 
functions  of  the  hypotheses  cannot  be  parametrized  by  a  set  of  finite  parameters. 
Nonparametric  classifiers  usually  achieve  good  performance  over  a  large  class  of 
distribution  functions. 

Fixed sample size nonparametric recognition systems have been treated extensively
in the literature [11]. However, few results exist for the sequential form
of nonparametric techniques. Most of the nonparametric classification techniques
discussed in the literature are based on some form of rank tests [4], or on a calculation
of the "distance" or least-mean-square difference between the observed signal
vector, X, and the set of catalog prototypes, $S_{i,j}$, $i = 1, \ldots, M$, $j = 1, \ldots, N_s$, for
the $N_s$ prototypes of each of the M classes [8]. An example of a nonparametric
technique based on a comparison of distances is the nearest neighbor (NN) algorithm,
which is employed in various pattern recognition applications [18,19] and is
implemented in a sequential scheme later in this chapter.

4.2  Linear  Sequential  Pattern  Classification 

The linear sequential pattern classifier approach combines the sequential nature
of classifiers based on sequential decision theory with the linear structure of
a linear classifier [8]. In this algorithm, decisions to repeat an observation or to
classify the pattern are made using linear functions derived from a set of sample
patterns by the least mean-square error criterion. The decision procedure of this
classifier, for a pattern X, is to measure the components $x_1, x_2, \ldots, x_K$ and classify
X to class i if its image ($X \times W_n$) lies closest to a reference point $b_i$; that is,
if

$$\|X \times W_n - b_i\| = \min_{1 \le j \le M} \|X \times W_n - b_j\| \tag{4.1}$$

where the transformation matrix $W_n$ can be expressed in terms of the sample
pattern matrix S and the reference point matrix R. At the nth stage of the test,
$W_n = S^{+}R$, where $S^{+}$ is the generalized inverse of the matrix S. The decision
boundaries employed are exactly the same as those defined by Armitage [6] but
mapped into the decision space of the linear least mean square classifier.

The concept underlying this approach is very simple; however, the implementation
of this technique requires much computation at each stage of the sequential
test. The implementation of this algorithm is complex because it requires the
computation of the generalized inverse of the pattern matrix S, whose dimensions
increase whenever a new observation is requested.
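The least mean-square mapping described above can be sketched with NumPy, whose `numpy.linalg.pinv` computes the Moore-Penrose generalized inverse. The sample patterns, the labels, and the use of unit vectors as reference points are hypothetical choices made for illustration; the sketch covers only the mapping and the nearest-reference-point rule, not the sequential stopping boundaries.

```python
import numpy as np

def fit_linear_map(S, R):
    """Least mean-square transformation W = S^+ R.

    S : (num_samples, num_features) sample pattern matrix
    R : (num_samples, M) matrix whose rows are the reference points of the samples
    """
    return np.linalg.pinv(S) @ R

def classify(x, W, refs):
    """Assign x to the class whose reference point is closest to the image x W."""
    image = x @ W
    return int(np.argmin(np.linalg.norm(refs - image, axis=1)))

# Hypothetical two-class training set with unit-vector reference points.
S = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels = np.array([0, 0, 1, 1])
refs = np.eye(2)
W = fit_linear_map(S, refs[labels])
```

Because S grows by one block of rows or columns each time a new observation is requested, the pseudoinverse has to be recomputed at every stage, which is the computational burden noted in the text.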


In  [14]  a  sequential  classification  algorithm  using  nonparametric  ranking  is 
proposed.  This  technique  requires  ranking  of  vector  components  that  represent 
both  the  catalog  data  and  the  random  observation.  If  single  frequency  radar  is  used 
and  the  target  is  of  known  azimuth,  then  this  algorithm  can  be  applied  directly. 
However,  if  the  observation  vector  utilizes  more  than  one  frequency  component, 
then  ranking  is  according  to  the  norm  of  the  vector  of  observations,  thus  resulting 
in  a  loss  of  information.  If  the  azimuth  position  of  the  target  is  ambiguous  or 
known  only  to  be  within  a  certain  range  then  this  algorithm  is  not  useful. 

4.3  Sequential  Nearest  Neighbor  (SNN)  Techniques 

The nearest neighbor technique of pattern recognition is based on the computation
of the vector distance between the observed signal, X, and each of the
class prototype vectors $S_{i,j}$ from class i and subclass j. Specifically, the nearest
neighbor algorithm decides class $\omega_i$ if $i = \arg\min_l \{\|X - S_{l,j}\|;\ j = 1, \ldots, N_s\}$,
where $\|\cdot\|$ denotes the Euclidean distance for the K-dimensional complex vectors.

This  technique  may  be  implemented  as  a  part  of  a  sequential  classification 
procedure  as: 

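1. Compute the average, $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$, of the n available observations.

2. Compute $d_{min} = \min_l \{\|\bar{X} - S_{l,j}\|;\ j = 1, \ldots, N_s\}$, where

$$d_{min}^2 = \min_{l,j} \sum_{k=1}^{K} \left[\mathrm{Re}^2(\bar{x}_k - s_{l,j,k}) + \mathrm{Im}^2(\bar{x}_k - s_{l,j,k})\right] \tag{4.2}$$

3. Decide class $\omega_i$ if $i = \arg\min_l \{\|\bar{X} - S_{l,j}\|;\ j = 1, \ldots, N_s\}$ and if $d_{min} < A_i$.
Otherwise, increment the number of observations, n, and repeat the test.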
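The three steps above can be sketched as follows. The function name, the array layout of the prototype catalog, and the fallback to the current nearest class when the measurements run out are assumptions made for illustration; the thresholds here are constant, without the n-dependent modification.

```python
import numpy as np

def sequential_nearest_neighbor(observations, prototypes, thresholds):
    """Sketch of the sequential nearest neighbor (SNN) test.

    observations : (N, K) complex array, one row per measurement
    prototypes   : (M, Ns, K) complex array of catalog prototypes S_{i,j}
    thresholds   : (M,) array of distance thresholds A_i
    Returns (decided class index, number of measurements used).
    Assumes at least one observation is supplied.
    """
    running_sum = np.zeros(prototypes.shape[-1], dtype=complex)
    for n, x in enumerate(observations, start=1):
        # Step 1: average of the n available observations.
        running_sum = running_sum + x
        x_bar = running_sum / n
        # Step 2: Euclidean distance to every prototype, shape (M, Ns).
        dist = np.linalg.norm(prototypes - x_bar, axis=-1)
        i, j = np.unravel_index(np.argmin(dist), dist.shape)
        # Step 3: decide only if the minimum distance is below the class threshold.
        if dist[i, j] < thresholds[i]:
            return int(i), n
    # Assumption: when measurements run out, return the current nearest class.
    return int(i), len(observations)
```

In the usage below, the first noisy observation is too far from its prototype to pass the threshold, while the average of the first two observations is close enough for a decision.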

We point out that the process of averaging observations in the first step of the
test may be viewed as a means to enhance the signal-to-noise ratio (SNR) of the
decision statistic. Indeed, for coherent averaging and independent noise, the SNR
at the nth stage of averaging is increased by $10\log_{10}(n)$ dB, which
implies a reduction by a factor of n in the expected value of the minimum distance
squared, $d_{min}^2$, for the test. On the other hand, the enhancement of the SNR by
averaging additional observations is directly related to the enhancement of the
decision statistics based on likelihood functions for parametric tests, especially in
the case of additive Gaussian noise.

Figure 5: Decision regions for the SNN technique

In  a  fashion  similar  to  that  employed  for  the  parametric  tests,  we  may  modify 
the  thresholds  in  the  sequential  nearest  neighbor  test  to  limit  the  maximum  number 
of  measurements  and  reduce  the  average  number  of  measurements  while  maintain¬ 
ing  low  error  probabilities.  In  particular,  we  choose  the  measurement-dependent 
thresholds  as: 


At(n)  =  Ai{l  +  -)  (4.3) 

for  constant  r.  Figure  5  shows  the  decision  boundaries  for  this  algorithm.  Notice 
that  the  null  region  is  above  the  threshold  in  contrast  to  the  techniques  discussed 
in  the  previous  chapters.  The  reason  for  this  choice  of  thresholds  is  that  the 
sequential  test  becomes  less  restrictive  as  the  number  of  observations  increases. 

As shown in Chapter V, by increasing the value of the thresholds $A_i$, the
classification time decreases with a slight increase in the probability of error. Thus,
the performance of this algorithm, like that of the parametric techniques, depends
on the area of the null region, as well as the way this region varies with the number
of observations.

The results of simulation studies of the performance of this nonparametric
technique show that under certain circumstances, the sequential nearest neighbor
test performance is comparable to that of the parametric techniques discussed
above. In addition, the sequential nearest neighbor test requires only the computation
of $M \cdot N_s$ vector distances at each stage of the test.

The choice of the thresholds $A_i$ depends on the data available or the catalogue
of information used for comparison. As a nonparametric approach, the nearest
neighbor decision rule requires decision boundaries that are not directly related
to the error probabilities e(i,j). The threshold $A_i$ must be chosen such that the
sequential test is not terminated before enough observations are requested. That
is, no decision should be declared unless the tested target is closer to a prototype
in its own class than to any prototype in the other classes. Thus, information
concerning the noise power level will help in choosing optimal (or suboptimal)
stopping boundaries.

An important feature of the nearest neighbor test as a nonparametric classification
algorithm is that it does not require noise-free catalogue data; that is, the
class prototype vectors can be noisy.


The  above  mentioned  algorithm  can  be  applied  to  any  version  of  the  nearest 
neighbor  decision  rule.  For  example,  similar  algorithms  can  be  applied  to  the 
nearest  neighbor  with  reject  option.  In  the  nearest  neighbor  with  reject  option 
and  for  every  pair  of  integers  ( k,l )  with  k/2  <  l  <  k  the  k  nearest  neighbors  of 
an  observation  are  examined,  and  if  l  or  more  of  them  are  in  the  same  class  the 
observation  is  assigned  to  this  class,  otherwise  it  is  rejected. 
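The (k, l) reject rule described above can be sketched as follows. The function name and the one-dimensional prototype set are hypothetical, and the sketch covers only the fixed-sample decision rule, not its sequential version.

```python
import numpy as np
from collections import Counter

def knn_with_reject(x, prototypes, labels, k, l):
    """Nearest neighbor rule with reject option.

    Examines the k nearest prototypes of x and assigns x to a class only if
    at least l of them belong to that class; otherwise returns None (reject).
    Requires k/2 < l < k, as stated in the text.
    """
    if not (k / 2 < l < k):
        raise ValueError("the rule requires k/2 < l < k")
    dist = np.linalg.norm(prototypes - x, axis=1)
    nearest = np.argsort(dist)[:k]
    label, votes = Counter(labels[i] for i in nearest).most_common(1)[0]
    return label if votes >= l else None

# Hypothetical catalog: two prototypes per class on a line.
prototypes = np.array([[0.0], [1.0], [4.0], [5.0]])
labels = [0, 0, 1, 1]
```

An observation near the class-0 prototypes is accepted with (k, l) = (3, 2), while an observation midway between the classes splits its four nearest neighbors evenly and is rejected with (k, l) = (4, 3).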

A sequential version of the "nearest class mean classifier" can also be derived
in a similar manner as the sequential nearest neighbor algorithm. The nearest
class mean classifier is a classification algorithm based on choosing the class whose
prototypes are closer on average to the tested target than the prototypes of all other
classes. In [10] it is shown that the minimum distance classifier gives better results
than the nearest neighbor method at high noise power levels. Thus, an efficient
sequential classification algorithm may use the sequential version of the nearest
neighbor at low noise power levels and a sequential minimum distance classifier at
high noise power levels. To be able to use such an algorithm, knowledge
of the noise power level should be available in advance.


CHAPTER  V 


Simulation  Results 

In this chapter, we present the results of Monte-Carlo simulation studies of
the performance of the various sequential tests discussed in the previous chapters.
The percentage of classification error and the average number of measurements
are given for a number of combinations of the total number of classes, M,
the number of subclasses, $N_s$, and the dimension, K, of the vector of observations,
X. The maximum number of measurements is set at a nominal value of N = 10.

In  each  case,  the  goal  is  to  classify  an  observation  of  an  unknown  radar  signal 
as  being  produced  by  one  of  a  set  of  up  to  five  different  commercial  aircraft,  each 
represented  by  a  class  containing  prototypes  representing  vector  observations  of 
the  particular  aircraft  at  up  to  nineteen  different  azimuth  angles  ranging  from  0° 
to  180°. 

5.1  Database 

The database consists of coherent radar backscatter measurements of scale
models of five commercial aircraft, obtained from The Ohio State University
ElectroScience Laboratory compact range. The compact range data have been normalized
by removing all system related parameters from the measurements. Scaled
data are available for each aircraft at 0° elevation angle, and azimuth positions at
0°, 10°, 20°, ..., 180°. The dimension of the vector of observations, K, ranges
from K = 1 to K = 51, covering a frequency range of 8-58 MHz using horizontally
transmitted, horizontally received polarization (HHP). Each measurement component
is a complex number whose phase and amplitude are known (coherent radar
backscatter). (See [1] for a discussion of the generation and characteristics of the
aircraft catalog database.)

5.2  Measurements  and  Noise  Model 


The lth observation vector $X^l = [x_1^l, x_2^l, \ldots, x_K^l]^T$, whose dimension K corresponds
to the number of frequencies used, represents the complex normalized
scattering coefficient of a prototype from an unknown class. The complex number
corresponds to the complex scattering coefficient whose magnitude is the square
root of the measured cross section in square meters, m², and whose phase is that
of the measured signal.

For the simulation experiments of both parametric and nonparametric sequential
techniques, it is assumed that the observation process corresponds to a linear
system measurement of the signal vector, S, in the presence of additive Gaussian
noise. The signal vector is taken as one of the $N_s$ prototypes from one of the M
classes, where each class corresponds to one of the five aircraft in the database.

The additive Gaussian noise is represented by two uncorrelated random numbers
$W_R$ and $W_I$, each having a Gaussian distribution with zero mean and variance
$\sigma^2/2$. Thus the total additive noise $W = W_R + jW_I$ ($j = \sqrt{-1}$) has a Gaussian
distribution with zero mean and variance $\sigma^2$. The lth observed complex normalized
scattering coefficient of a target of class i at the jth azimuth position and using
the kth frequency component is:

$$x_k^l = \left[\mathrm{Re}(s_{i,j,k}) + W_R\right] + j\left[\mathrm{Im}(s_{i,j,k}) + W_I\right] \tag{5.1}$$
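The measurement model can be simulated with a few lines of NumPy. This is a sketch under the assumption that the real and imaginary noise components each carry half of the total noise variance, so that the complex noise has variance σ²; the function name and the prototype values are hypothetical.

```python
import numpy as np

def simulate_observations(s, sigma2, n, rng):
    """Generate n noisy observations of a complex prototype vector s.

    s      : (K,) complex prototype (scattering coefficients at K frequencies)
    sigma2 : total variance of the complex noise W = W_R + j W_I
    rng    : numpy random Generator
    Returns an (n, K) complex array.
    """
    std = np.sqrt(sigma2 / 2.0)   # assumed: each component has variance sigma2 / 2
    w_r = rng.normal(0.0, std, size=(n, s.shape[0]))
    w_i = rng.normal(0.0, std, size=(n, s.shape[0]))
    return (s.real + w_r) + 1j * (s.imag + w_i)

# Hypothetical two-frequency prototype.
s = np.array([1.0 + 2.0j, -1.0j])
x = simulate_observations(s, 0.5, 50000, np.random.default_rng(0))
```

Averaging the rows of `x` recovers the prototype, and the mean squared magnitude of `x - s` approaches the total noise variance.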
Table 1: Error rate and average number of measurements for the Reed technique
with M = 5 classes, Ns = 6 prototypes/class, and K = 4.

Noise (dBsm)    Error    E{n}
16              0.000    2.14
19              0.004    2.44
22              0.044    3.29
25              0.136    4.57
28              0.288    6.86
31              0.412    8.90

5.3  Simulation  Approach 

When the azimuth angle of the radar object is assumed to be known, then
$N_s = 1$ and the resulting vector observation, X = S + W, is assumed to be a Gaussian
random vector with mean vector S and covariance matrix $\sigma^2 I$, where I is the
K × K identity matrix. When the azimuth angle of the object is assumed to
be unknown, or known to within a specified range, the observation vector X is
assumed to be distributed as a Gaussian mixture as in (2.4), corresponding to
the $N_s$ subclasses for each of the M classes. The target is assumed to stay in
the same azimuth position whenever a new observation is requested; that is, all
measurements correspond to one target at a fixed position in azimuth and elevation.

In  Figure  6,  the  probability  of  classification  error  is  shown  as  a  function  of 
noise  power  level  for  the  Reed  test  using  the  modified  form  of  the  thresholds  (3.7). 
The  average  number  of  required  measurements,  E{n }  for  this  test  as  a  function  of 
noise  power  level  is  given  in  Figure  7.  These  results  are  also  tabulated  in  Table  1. 

The results in Figures 6-7 are computed for $N_s = 6$ prototypes per class,
corresponding to target azimuth angles of 0°, 10°, 20°, 30°, 40°, and 50°. In this
case, the measurement vectors are of dimension K = 4, corresponding to coherent
backscatter measurements at 8, 11, 16, and 25 MHz. The error percentages and
the average number of measurements are based on the results of three hundred
experiments.

Figure 6: Error rates for the Reed test with M = 5 classes, $N_s$ = 6
prototypes/class, K = 4.

The coordinates of the abscissa in these figures refer to the power, or
variance, σ² of the Gaussian observation in terms of decibels relative to the
power, P_r, received from an ideal radar signal reflector with one square meter
area, i.e.,

\[ \text{Noise Power (dBm}^2\text{)} = 10 \log_{10} \left( \frac{\sigma^2}{P_r} \right) \]


For the experimental results presented here, the average signal power of the
component of the radar measurement due to the target of interest is
approximately 20 dBm², so that an approximate SNR in decibels may be calculated
for any of the data presented below as SNR ≈ 20 − Noise Power (dBm²). Figures
8-9 show the effect of the suggested modifications to the Reed test on both the
probability of error and the average number of measurements, with Ns = 6, K = 4.
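
As a small worked illustration of the two relations above, the following sketch converts a noise variance to decibels relative to the reference return and applies the approximate SNR rule. The function names are hypothetical, and the default signal level of 20 dB is the approximate figure quoted in the text.

```python
import math


def noise_power_db(noise_variance, reference_power):
    """Noise power in decibels relative to the reference return P_r."""
    if noise_variance <= 0 or reference_power <= 0:
        raise ValueError("powers must be positive")
    return 10.0 * math.log10(noise_variance / reference_power)


def approximate_snr_db(noise_db, signal_db=20.0):
    """Approximate SNR in dB, assuming an average target return of about 20 dB."""
    return signal_db - noise_db


# Noise levels used in Table 1 and the approximate SNR at each level.
for level in (16, 19, 22, 25, 28, 31):
    print(level, approximate_snr_db(level))
```

Under this rule the tabulated noise levels of 16 to 31 dBsm correspond to approximate SNRs from +4 dB down to −11 dB.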

In Figures 10 and 12 the probability of classification error is shown as a
function of noise power level for the Reed test with Ns = 5, K = 3 and
Ns = 4, K = 2, respectively. The average number of required measurements for
these cases is shown in Figures 11 and 13. Figures 14-15 show the
classification error and average number of measurements for the Reed test with
Ns = 2 (0°, 10°) and K = 2.

Figures 16-17 show the probability of error and the average number of
measurements as a function of the noise power level for the Reed test using a
single frequency radar with Ns = 1 (simple classes). Similar results for the
Armitage and Palmer tests are shown in Figures 18-37. These results are also
tabulated in Tables 2-3.


Figure 8: Comparison of error rate for Reed and modified Reed techniques with
M = 5 classes, Ns = 6 prototypes/class, K = 4.


Figure 9: Comparison of the average number of measurements for the Reed test
and the modified Reed test with M = 5 classes, Ns = 6 prototypes/class, K = 4.


Figure 12: Error rates for the Reed test with M = 5 classes, Ns = 4
prototypes/class, K = 2.


Figure 14: Error rates for the Reed test with M = 5 classes, Ns = 2
prototypes/class, K = 2.


Figure 15: Average number of measurements for the Reed test with M = 5
classes, Ns = 2 prototypes/class, K = 2.


Figure 16: Error rates for the Reed test with M = 5 classes, Ns = 1
prototypes/class, K = 1.


Table 2: Error rate and average number of measurements for the Armitage
technique with M = 5 classes, Ns = 6 prototypes/class, and K = 4.

Noise (dBsm)    Error    E{n}
16              0.000    1.37
19              0.000    2.28
22              0.006    4.50
25              0.064    7.18
28              0.234    9.41
31              0.352    9.96

Figure 17: Average number of measurements for the Reed test with M = 5
classes, Ns = 1 prototypes/class, K = 1.


Figure 18: Error rates for the Armitage test with M = 5 classes, Ns = 6
prototypes/class, K = 4.


Figure 19:


Figure 21: Average number of measurements for the Armitage test with M = 5
classes, Ns = 5 prototypes/class, K = 3.

3: Error rates for the Palmer test with M = 5 classes, Ns = 6
prototypes/class, K = 4.


Figure 30: Error rates for the Palmer test with M = 5 classes, Ns = 5
prototypes/class, K = 3.


Figures 38-39 show the effect of the proposed modification (3.4) to the
Armitage procedure. For these simulations, target prototypes at azimuth angles
of 0°, 10°, 20°, 30°, and 40° are used for measurement vectors of dimension
K = 3, with frequencies of 8, 9, and 10 MHz. These results are also tabulated
in Tables 4-6. From these figures it is clear that fewer measurements are
required for the modified test for higher values of the parameter r, while the
resulting error percentages remain almost the same as for the original test
(r = 0).

Figures 40-41 show the reduction in the average number of observations due to
the modified Armitage thresholds (3.4) where the target prototypes are at
azimuth angles of 0°, 10°, 20°, and 30°. The measurement vector is of dimension
K = 2, with frequencies of 8 and 9 MHz.

In Figure 42, the probability of classification error is shown as a function of
noise power level for the tree structured sequential test. The average number
of required measurements, E{n}, for this test as a function of noise power
level is given in Figure 43. This sequential test starts by classifying the
target into the groups of hypotheses {ω5} or {ω1, ω2, ω3, ω4}; then, if it is
not of class {ω5}, the target is classified as a member of the groups {ω1, ω2}
or {ω3, ω4}. Finally, the classification


Table 3: Error rate and average number of measurements for the Palmer
technique with M = 5 classes, Ns = 6 prototypes/class, and K = 4.

Noise (dBsm)    Error    E{n}
16              0.003    1.01
18              0.020    1.03
22              0.088    1.20
26              0.235    2.84
28              0.329    4.51
30              0.448    5.52

Figure 41: Average number of measurements for the modified Armitage
thresholds for r = 0, 1, 2 with M = 5 classes, Ns = 1 prototypes/class, K = 1.


Table 4: Error rate and average number of measurements for the modified
Armitage thresholds with M = 5 classes, Ns = 5 prototypes/class, K = 3, and
r = 0.

Noise (dBsm)    Error    E{n}
30              0.536    9.292
32              0.600    9.552
34              0.656    9.760
36              0.672    10.00
38              0.684    10.00
40              0.704    10.00

Table 5: Error rate and average number of measurements for the modified
Armitage thresholds with M = 5 classes, Ns = 5 prototypes/class, K = 3, and
r = 1.

Noise (dBsm)    Error    E{n}
30              0.588    8.180
32              0.636    8.548
34              0.648    8.996
36              0.680    9.404
38              0.664    9.808
40              0.704    9.992

Table 6: Error rate and average number of measurements for the modified
Armitage thresholds with M = 5 classes, Ns = 5 prototypes/class, K = 3, and
r = 2.

Noise (dBsm)    Error    E{n}
30              0.568    5.964
32              0.628    6.212
34              0.676    6.652
36              0.696    6.948
38              0.696    7.364
40

Figure 43: Average number of measurements for the tree structured sequential
test with M = 5 classes, Ns = 6 prototypes/class, K = 4.


Table 7: Error rate and average number of measurements for the tree structured
sequential technique with M = 5 classes, Ns = 6 prototypes/class, and K = 4.

Noise (dBsm)    Error    E{n}
16              0.002    1.05
19              0.008    1.27
22              0.008    1.99
25              0.068    3.95
28
31              0.216


Table 8: Error rate and average number of measurements for the sequential
nearest neighbor technique with M = 5 classes, Ns = 6 prototypes/class, and
K = 4.

Noise (dBsm)    Error    E{n}
16              0.007    1.78
18              0.010    2.71
20              0.014    4.13
22              0.012    5.88
24              0.012    7.38
26              0.020    8.56

backscatter  measurements  at  8,  11,  16,  and  25  MHz.  The  error  percentages  and 
the  average  number  of  measurements  are  based  on  the  results  of  three  hundred 
experiments. 

In Figures 46 and 48, the probability of classification error is shown as a
function of noise power level for the sequential nearest neighbor test with
Ns = 5, K = 3 and Ns = 4, K = 2, respectively. The average number of required
measurements for these cases is shown in Figures 47 and 49. Figures 50-51 show
the probability of error and the average number of measurements as a function
of the noise power level for the sequential nearest neighbor test using a
single frequency radar with Ns = 1 (simple classes). Figures 52-53 show the
misclassification error and the average number of observations for the
sequential nearest neighbor test with Ns = 2 (0°, 10°) prototypes and K = 2.

Figures 56-57 show the probability of error and the average number of
observations as a function of the threshold A (accepted minimum distance) for
the sequential nearest neighbor test. These experiments were run at a 25 dBsm
noise power level.

A comparison between the sequential nearest neighbor test and the nearest
neighbor test with a fixed number of observations is shown in Figures 54-55.
Figure 54 shows that the probability of error is almost the same for both
algorithms, while Figure 55 shows the reduction in the average number of
measurements due to the sequential nearest neighbor test. Thus, Figure 55 can
be considered as a plot of the efficiency of the sequential nearest neighbor
test. Tabulated results of this comparison are shown in Table 9.


Table 9: Error rate and average number of measurements for both the sequential
nearest neighbor technique and the fixed nearest neighbor technique with M = 5
classes, Ns = 1 prototypes/class, and K = 4.


Figure 55: Comparison of the average number of measurements for the sequential
nearest neighbor test and the fixed nearest neighbor test with M = 5 classes,
Ns = 1 prototypes/class, K = 4.

5.4  Comparison 

A number of conclusions can be drawn from these results. First, notice that
the sequential nearest neighbor test produces a reasonably low error rate, but
requires a large average number of measurements because of its nonparametric
nature. In addition, at high noise power levels (> 20 dBsm) this test
frequently requires the maximum number of measurements (N = 10) before reaching
a decision. However, this test requires only the computation of M × Ns vector
distances at each stage of the sequential test. Moreover, the variation in the
error rate with respect to the noise power level is almost linear. Thus, at
very high noise power levels, the sequential nearest neighbor test gives a low
error rate compared with the parametric algorithms.

The Armitage technique also produces low error rates at lower noise power
levels, but the average number of measurements for this test seems to be
particularly sensitive to the noise level, and it requires a large number of
measurements for levels exceeding 20 dBsm. The reason is that pairwise
comparison is very restrictive, especially at high noise power levels. In
addition, this algorithm is relatively complex because it requires the
computation of M(M − 1)/2 likelihood ratios at each stage of the sequential
test. However, as shown in Figures 58-63, this test provides the best
compromise (among the presented algorithms) between the average number of
observations and the probability of error.


Figure 56: Error rates for the sequential nearest neighbor test as a function
of the minimum distance at a 25 dBsm noise level with M = 5 classes, Ns = 6
prototypes/class, K = 4.

The method due to Palmer requires the computation of a single likelihood ratio
at each stage of the test. This method gives a low error rate, compared to the
other algorithms, at low noise power levels (< 15 dBsm). However, the
performance of the Palmer test is not satisfactory at high noise power levels
(> 20 dBsm), since it gives a higher error rate than the other algorithms with
almost the same number of observations. The reason for this is that a decision
based on comparison of the two largest likelihood functions is not reliable at
high noise power levels because all of these functions approach each other at
such noise levels.
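
The way likelihood functions approach each other at high noise can be made concrete with a short calculation. For two simple Gaussian hypotheses with mean vectors s1 and s2 and common covariance σ²I, the expected log-likelihood ratio under the true hypothesis is ||s1 − s2||² / (2σ²), so every additional 10 dB of noise reduces the expected separation by a factor of ten. The sketch below evaluates this quantity; the prototype vectors are made-up values, real-valued vectors are assumed, and the reference power is taken as 1.

```python
def variance_from_db(noise_db, reference_power=1.0):
    """Noise variance for a noise level given in dB relative to reference_power."""
    return reference_power * 10.0 ** (noise_db / 10.0)


def expected_log_likelihood_ratio(s1, s2, variance):
    """Expected log-likelihood ratio between two Gaussian means when s1 is true."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(s1, s2))
    return squared_distance / (2.0 * variance)


# Hypothetical prototype vectors for two classes (K = 4).
s1 = [10.0, 4.0, 7.0, 2.0]
s2 = [6.0, 9.0, 3.0, 5.0]

for noise_db in (15, 20, 25, 30):
    separation = expected_log_likelihood_ratio(s1, s2, variance_from_db(noise_db))
    print(f"{noise_db} dB: expected log-likelihood ratio = {separation:.4f}")
```

As the separation falls toward zero, the ordering of the two largest likelihood functions is increasingly determined by the noise rather than by the target, which is consistent with the behavior described for the Palmer test.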

The Reed algorithm (geometric mean comparison) requires the computation of
only M likelihood ratios. This algorithm does not provide a good compromise
between the error probability and the average number of measurements,
especially at low noise power levels. The reason is that at low noise levels
all likelihood functions have small values except for that of the unclassified
target, and the geometric mean of all of the likelihood functions does not
accurately represent the likelihood function of the true hypothesis. However,
this algorithm is less sensitive to noise power levels than the other
parametric algorithms in that the average number of observations increases
smoothly as the noise power increases.


Figure 57: Average number of measurements for the sequential nearest neighbor
test as a function of the minimum distance at a 25 dBsm noise level with
M = 5 classes, Ns = 6 prototypes/class, K = 4.


Finally, Figures 42-43 imply that the best performance is attained by applying
the tree method suggested in this study. While this test requires more
computations than most of the other tests, it generally requires the fewest
observations, and its error rate compares favorably to that of the other tests.
The tree test gives a good compromise between the number of observations and
the probability of error. However, the performance of this test depends on the
number of classes and the types of thresholds chosen. In addition, this
algorithm is not as complex as the Armitage test since it requires the
computation of one likelihood ratio at each stage, and even fewer computations
are required as the test progresses until each likelihood function represents
a single hypothesis as in the Wald test. Moreover, it is possible to predict
the average number of observations required by the tree algorithm since the
number of observations is predictable at each level of the algorithm.
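
The per-stage computation counts quoted in this section can be collected in one place for a given problem size. The sketch below simply evaluates the expressions stated in the text; the function name and dictionary keys are hypothetical, and the counts say nothing about the cost of evaluating each individual likelihood or distance.

```python
def per_stage_cost(num_classes, prototypes_per_class):
    """Number of quantities computed at each stage, as stated in the text."""
    m, ns = num_classes, prototypes_per_class
    return {
        "sequential nearest neighbor (vector distances)": m * ns,
        "Armitage (pairwise likelihood ratios)": m * (m - 1) // 2,
        "Reed (likelihood ratios)": m,
        "Palmer (likelihood ratios)": 1,
        "tree (likelihood ratios)": 1,
    }


# The main experimental configuration: M = 5 classes, Ns = 6 prototypes/class.
for name, count in per_stage_cost(5, 6).items():
    print(f"{name}: {count}")
```

For M = 5 and Ns = 6 this gives 30 vector distances for the sequential nearest neighbor test and 10 likelihood ratios for the Armitage test, against 5 for the Reed test and 1 for the Palmer and tree tests.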

The sequential maximum a posteriori algorithm proposed in this study, which
is a modification to the Reed test, reduces the number of observations required
by the Reed algorithm without altering the error rate and without increasing
the complexity of the Reed algorithm. In addition, this test can be considered
as a suboptimal solution to the Bayes sequential test.

5.5 Sequential Classifier Operating Characteristics (SCOC)


The performance of each of the sequential classification algorithms discussed
in this study is evaluated according to its operating characteristics. In
Figures 58-63, the probability of misclassification is plotted against the
average number of observations for each of the sequential classification
algorithms. These curves are considered as operating characteristics for the
sequential classifiers. Moreover, these curves can be used for comparison of
the various techniques discussed in this study. Thus, a sequential classifier
can be designed, using these operating characteristics, such that it will
require a certain number of observations for an accepted probability of error.
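
The design use of operating characteristics can be illustrated with the tabulated results. The sketch below is only an illustration: it uses a single operating point per technique (the 22 dBsm rows of Tables 1, 2, 3, 7, and 8, all with M = 5, Ns = 6, K = 4) rather than the full curves of Figures 58-63, and the function name is hypothetical.

```python
# (error rate, average number of measurements) at 22 dBsm, copied from
# Tables 1, 2, 3, 7, and 8.
OPERATING_POINTS_22_DBSM = {
    "Reed": (0.044, 3.29),
    "Armitage": (0.006, 4.50),
    "Palmer": (0.088, 1.20),
    "tree": (0.008, 1.99),
    "sequential nearest neighbor": (0.012, 5.88),
}


def cheapest_classifier(points, max_error):
    """Return the technique with the smallest E{n} whose error is within max_error.

    Returns None when no tabulated technique meets the error bound.
    """
    feasible = {name: p for name, p in points.items() if p[0] <= max_error}
    if not feasible:
        return None
    return min(feasible, key=lambda name: feasible[name][1])


for bound in (0.01, 0.05, 0.10):
    print(bound, cheapest_classifier(OPERATING_POINTS_22_DBSM, bound))
```

With an accepted error probability of 1 percent this selection returns the tree test; relaxing the bound to 10 percent admits the Palmer test, which needs the fewest measurements at this noise level.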

Figure 63 shows that for the modified version of the Armitage threshold
proposed in this study, the best reduction in the average number of
observations is attained when r = 1.

Figures 58-63 show that the Armitage and the tree tests offer the best
compromise between the probability of error and the average number of
measurements. In addition, while the Armitage technique requires fewer
observations than the tree method for high error probabilities (> 3 percent),
the tree technique requires fewer observations than the Armitage technique for
very low probabilities of error. Moreover, it is clear from these figures that
there is a reduction in the average number of observations needed by sequential
methods compared to the maximum a posteriori test (Likelihood) with a fixed
number of observations. This reduction reaches 60 percent at some noise levels
and for certain error probabilities. However, for high error probabilities
(> 10 percent), the operating characteristics of all the sequential
classification techniques approach each other. Finally, it is clear that the
performance of each of the sequential algorithms depends on the computations
required and the complexity of the sequential test.

Figure 59: Comparison of the sequential classifiers operating characteristics
at 30 dBsm noise level with M = 5, Ns = 1 prototypes/class, K = 4.


Figure 62: Comparison of the sequential classifiers operating characteristics
at 25 dBsm noise level with M = 5, Ns = 2 prototypes/class, K = 2.


Figure 63: Comparison of the sequential classifier operating characteristics
for the Armitage technique at 30 dBsm noise level with M = 5, Ns = 2
prototypes/class, K = 2.

CHAPTER VI


Conclusions 

The simulation results presented in Chapter V indicate that sequential
hypothesis testing techniques may realize significant advantages for
applications to radar target identification problems. The flexibility of these
techniques and their ability to satisfy practical constraints on the
classification error performance of radar systems make the incorporation of
sequential algorithms particularly attractive for RTI.

Sequential methods can be used to tune and direct interrogating radar for
maximum reliability. Moreover, sequential hypothesis techniques can announce
early, low confidence decisions and, as more data is acquired, high confidence
decisions. Thus, sequential techniques are flexible.

By changing the parameters that define the decision regions, we can alter
the performance of the test to match our bounds on error probability and test
length (number of observations). A common feature of all the parametric and
nonparametric sequential algorithms presented in this study is that we can
alter the performance of these techniques by modifying the decision boundaries
to make them dependent on the number of observations.
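
One simple way to make a decision boundary depend on the number of observations is to let it shrink toward zero as the observation count n approaches the maximum N. The sketch below does this for a binary test between two scalar Gaussian means. The boundary form base · (1 − n/N)^r is an assumption chosen for illustration (r = 0 gives a constant boundary); it is not claimed to be the form of the thresholds (3.4) or (3.7) of this study, and all names are hypothetical.

```python
def boundary(n, max_obs, base, r):
    """Hypothetical decision boundary that shrinks as the observation count grows."""
    return base * (1.0 - n / max_obs) ** r


def truncated_sequential_test(samples, mean0, mean1, variance, base, r, max_obs):
    """Binary sequential test between two Gaussian means.

    Returns (decision, number of observations used). If no boundary is crossed
    within max_obs observations, the sign of the accumulated log-likelihood
    ratio forces a decision (truncation).
    """
    if not samples:
        raise ValueError("at least one observation is required")
    llr = 0.0
    n = 0
    for n, x in enumerate(samples[:max_obs], start=1):
        # Log-likelihood ratio increment for hypothesis 1 against hypothesis 0.
        llr += ((x - mean0) ** 2 - (x - mean1) ** 2) / (2.0 * variance)
        bound = boundary(n, max_obs, base, r)
        if llr >= bound:
            return 1, n
        if llr <= -bound:
            return 0, n
    return (1 if llr >= 0 else 0), n


observations = [0.6] * 10
for r in (0, 1, 2):
    print(r, truncated_sequential_test(observations, 0.0, 1.0, 1.0, 0.55, r, 10))
```

On this toy input the same decision is reached after 6, 4, and 3 observations for r = 0, 1, and 2, which shows how a boundary that tightens with n trades test length against the evidence required for a decision.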

Some of the M-ary hypothesis techniques with a fixed number of observations
can be modified into sequential techniques by specifying a certain null region
in the hypotheses decision space. Thus, the number of observations can be
reduced while the probability of error remains unchanged. The reason is that
when observations are taken sequentially, the classification test might
terminate before requesting all observations specified by a fixed number of
measurements test.

The performance of the various sequential classification algorithms depends
on the noise power level. Thus, we may choose to use more than one
classification algorithm at different noise power levels, provided that
a priori knowledge of the noise power level is available. For example, we may
use the method due to Palmer at low noise power levels because of its
simplicity, and the tree algorithm at high noise power levels.

It is clear from this study that the performance of any sequential
classification algorithm is dependent on the complexity of the algorithm. In
general, the tree algorithm proposed in this study minimizes the average number
of observations, with less complexity than the pairwise comparison (the
Armitage test).

Truncation is a major factor in sequential classification algorithms because
the number of observations is usually finite. Thus, an optimal form of
truncation is necessary to achieve good performance. It is also clear that the
optimization of parameters such as the maximum number of measurements, and the
choice of the functional form of the various decision thresholds, deserve
further investigation.

6.1  Noise  Dependent  Group  Sequential  Tests 

One common feature of all of the sequential techniques discussed in this study
is that a large number of observations is required at high noise levels. This
is clearly shown in Chapter V. Whenever an observation is repeated, hypothesis
testing is performed at each stage of the sequential test.

If a priori knowledge about the noise power level is available, which is the
case for parametric techniques, then the following modifications can be applied
to any of the sequential tests discussed.


1. The maximum allowed number of observations N is chosen as a function of
the noise power σ². That is, N is proportional to the noise power.

2. Groups of observations, rather than single observations, are requested one
group at a time. The size of these groups is dependent on both the noise
power and the stage of the test. For example, if a group of 4 measurements
is observed at the beginning, the size of the second group could be 2 or 1.

It is obvious that the above modification reduces the complexity of the
sequential classification algorithms because hypothesis testing is performed
after requesting a group of observations rather than a single observation at
each stage of the sequential test.
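
The example in item 2 (a first group of 4 measurements followed by a group of 2 or 1) can be turned into a simple schedule. The sketch below is one possible reading, not a rule given in this study: it assumes that the group size halves at each stage, never drops below one, and that the total never exceeds the maximum N. How the first group size and N should depend on the noise power is left open here, and the function name is hypothetical.

```python
def group_schedule(max_obs, first_group):
    """Group sizes that halve at each stage until max_obs observations are used."""
    if max_obs < 1 or first_group < 1:
        raise ValueError("max_obs and first_group must be at least 1")
    sizes = []
    remaining = max_obs
    size = first_group
    while remaining > 0:
        size = min(size, remaining)   # never request more than what is left
        sizes.append(size)
        remaining -= size
        size = max(1, size // 2)      # halve the next group, but keep at least 1
    return sizes


schedule = group_schedule(max_obs=10, first_group=4)
print(schedule, "->", len(schedule), "hypothesis tests instead of", sum(schedule))
```

With N = 10 and a first group of 4, the schedule is 4, 2, 1, 1, 1, 1, so at most 6 hypothesis tests are performed where a single-observation test would perform up to 10.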